@shuningc shuningc commented Nov 24, 2025

Embedding Instrumentation

- Tracks embedding generation events with input texts and output dimensions
- Vendor detection system supporting 13+ providers:
  - OpenAI, Azure OpenAI, AWS Bedrock, Google (Vertex AI, Gemini)
  - Cohere, Anthropic, Hugging Face, Ollama
  - Voyage AI, Jina, Mistral, and more
- Rule-based provider identification from class names
- Batch and single embedding operation support
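
To illustrate the rule-based identification described above, here is a minimal sketch of how class-name matching could work. It is not the code in `vendor_detection.py`: the `VendorRule` fields, the `detect_vendor` helper, the substring patterns, and the vendor labels are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class VendorRule:
    """Hypothetical rule: a vendor label plus class-name substrings."""

    vendor: str
    patterns: Tuple[str, ...]

    def matches(self, class_name: str) -> bool:
        # Case-insensitive substring match against the class name.
        name = class_name.lower()
        return any(pattern in name for pattern in self.patterns)


# Order matters: more specific rules come first, so that
# "AzureOpenAIEmbedding" is not claimed by the generic "openai" rule.
DEFAULT_RULES: Tuple[VendorRule, ...] = (
    VendorRule("azure_openai", ("azureopenai",)),
    VendorRule("openai", ("openai",)),
    VendorRule("aws_bedrock", ("bedrock",)),
    VendorRule("cohere", ("cohere",)),
    VendorRule("huggingface", ("huggingface",)),
)


def detect_vendor(
    class_name: str, rules: Sequence[VendorRule] = DEFAULT_RULES
) -> Optional[str]:
    """Return the vendor of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(class_name):
            return rule.vendor
    return None
```

Under these assumptions, supporting a new provider means adding one more `VendorRule` to the sequence, and a class name that matches no rule yields `None` rather than a guess.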
Infrastructure

- LlamaindexInstrumentor class with automatic callback handler registration
- LlamaindexCallbackHandler implementing LlamaIndex event handling
- vendor_detection.py module with extensible VendorRule system
- Integration with opentelemetry-util-genai for span and metrics emission
- Comprehensive test coverage (338 lines of tests)
Testing
✅ LLM instrumentation: 187 lines of tests
✅ Embedding instrumentation: 151 lines of tests
✅ Tested with OpenAI embeddings API
✅ All linting and formatting checks pass

- Add embedding event handlers (_handle_embedding_start, _handle_embedding_end)
- Extract model name, input texts, and dimension count from embedding events
- Create vendor_detection.py module with VendorRule-based provider detection
- Support 13+ embedding providers (OpenAI, Azure, AWS, Google, Cohere, etc.)
- Add test_embedding_instrumentation.py with single and batch embedding tests
- Update README with embedding documentation and provider list
- Tested successfully with OpenAI embeddings API
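
As a rough illustration of the extraction step listed above, the sketch below derives the input texts and dimension count from an embedding event payload. It is not the PR's `_handle_embedding_end`: the function name, the plain-dict payload with `"chunks"` and `"embeddings"` keys, and the returned field names are assumptions made for this example.

```python
from typing import Any, Dict, List, Optional


def summarize_embedding_payload(
    payload: Dict[str, Any], model_name: Optional[str] = None
) -> Dict[str, Any]:
    """Collect model name, input texts, and vector dimension.

    Assumes (hypothetically) that the payload maps "chunks" to the
    input texts and "embeddings" to one vector per input text.
    """
    texts: List[str] = list(payload.get("chunks") or [])
    vectors: List[List[float]] = list(payload.get("embeddings") or [])
    # All vectors from one model share a dimension, so the first suffices.
    dimension = len(vectors[0]) if vectors else None
    return {
        "model": model_name,
        "input_texts": texts,
        "input_count": len(texts),
        "dimension_count": dimension,
    }


# A single embedding and a batch go through the same code path.
single = summarize_embedding_payload(
    {"chunks": ["hello"], "embeddings": [[0.1, 0.2, 0.3]]},
    model_name="example-model",
)
batch = summarize_embedding_payload(
    {"chunks": ["a", "b"], "embeddings": [[0.0, 1.0], [1.0, 0.0]]}
)
```

Treating a single embedding as a batch of one keeps the handler free of special cases; a payload with no vectors reports `dimension_count` as `None` instead of raising.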
@shuningc shuningc requested review from a team as code owners November 24, 2025 20:55